Joint Mathematics Meetings January 18, 2020

Modernizing

mod·ern·ize | ˈmädərˌnīz |

verb [with object]

adapt (something) to modern needs or habits, typically by installing modern equipment or adopting modern ideas or methods: a five-year plan to modernize Algerian agriculture.

Modernizing Needs

modern needs

  • prediction/classification
  • decision making/trade-offs
  • causality

pre-modern needs

  • detect some signal in noise: averaging
  • infer genetics from phenotypes: \(r\)
  • provide arbitration for claims: \(p\)

Modernizing Equipment

modern equipment

  • computers
  • databases
  • high connectivity

pre-modern equipment

  • mechanical calculators
  • static tables and tabulations
  • printing as distribution

Modernizing Methods

modern methods

  • wrangling & visualization
  • machine/statistical learning
  • randomization, bootstrapping, cross-validation
  • directed acyclic graphs
  • false discovery rates, …

pre-modern methods

  • see, e.g., most intro stats books
  • correlation/simple regression
  • histograms

Modernizing pedagogy

Many excellent recommendations from GAISE and other sources.


I have one more to add, that I think is critical:

Base teaching on what we know now, not on what was being invented in 1880–1910.

Ontogeny recapitulates phylogeny?

In biology: Growth and development of an individual follows the same path as the evolution of a species.

Should statistics teaching follow embryology?

In education: Should individual students follow the same path as statistics as a whole?

Bernoulli \(\Rightarrow\) Gauss \(\Rightarrow\) Quetelet \(\Rightarrow\) Galton \(\Rightarrow\) Pearson \(\Rightarrow\) Gosset \(\Rightarrow\) Fisher \(\Rightarrow\) Neyman-Pearson …

probability, means, standard deviation, correlation coefficient, chi-squared, t-test, “significant”, “p-value”, …

If statistics were automobiles … 1880s

1888 Francis Galton introduces the “co-relation” coefficient

1885 Karl Benz designs 4-stroke engine for use in his automobile

If statistics were automobiles … 1900–1910

1908 William Gosset’s t statistic

1908 First Model T off Henry Ford’s production line

If statistics were automobiles … 1920s

1927 Ford Model A enters production

1925 ANOVA appears in Fisher’s Statistical Methods for Research Workers

Where does following the historical path get us?

  • introductory course typically ends at or before “one-way” ANOVA
  • we neglect thinking constructively about causation
    • we end up like Fisher in 1959 arguing that smoking doesn’t cause cancer
  • we avoid problems of prediction, researcher degrees of freedom, false discovery, evaluating trade-offs

Detours on the path to data science

We dip into historical coves and specialized techniques

  • chi-squared, unequal variance t-test, one-tailed tests, histogram, stem-and-leaf, box-and-whisker
  • we use historical vocabulary that can be off-putting or misleading
    • standard deviation, standard error, margin of error, significance

Proposal: Stats for Data Science

Let’s work backward from current needs: prediction, decision-making, causality.

Meet these needs without worrying about phylogeny and history.

A curriculum, with only the essentials:

  1. Data organization
  2. Graphics
  3. Models (generalize graphics to many variables)
    • present “hypothesis testing” as aiding decision making when model building
  4. … leaving time for covariates, causality, loss-functions & trade-offs, models that learn, …

Computing essentials

How can we make computing accessible to everyone, both practically and intellectually?

Practical: Browser-based applications, web apps

Intellectual: Define a small set of essential, high-level skills.

Essential, high-level computing skills

  1. Draw a point plot. Up to four variables: y, x, color, facet.
    • use jittering and transparency
  2. Construct a model: y ~ x + z and visualize it with (1).
    • allow flexibility
    • allow choice of architectures: machine-learning, bounded, unbounded.
  3. Evaluate a model at two different inputs: effect size
  4. Compare two models, e.g. y ~ 1 and y ~ 1 + x
    • cross-validated prediction error
    • F statistic
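
A minimal sketch of skills 2 to 4, using only the Python standard library. It is not the code behind the app; the data are made up, and the function names (`fit_constant`, `fit_line`, `effect_size`, `cv_error`) are hypothetical names chosen for illustration. It fits y ~ 1 and y ~ 1 + x, reads off an effect size by evaluating the model at two inputs, and compares the two models by cross-validated prediction error.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Model y ~ 1: always predict the mean of the training y values."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Model y ~ 1 + x: ordinary least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def effect_size(model, x_from, x_to):
    """Change in model output per unit change in input, between two inputs."""
    return (model(x_to) - model(x_from)) / (x_to - x_from)


def cv_error(fit, xs, ys, k=5):
    """k-fold cross-validated mean squared prediction error."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - model(xs[i])) ** 2 for i in test)
    return total / n


# Made-up illustrative data: a line plus alternating +1 / -1 "noise".
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs)]

line = fit_line(xs, ys)
print("effect size of x on y:", effect_size(line, 5.0, 15.0))
print("CV error, y ~ 1:      ", cv_error(fit_constant, xs, ys))
print("CV error, y ~ 1 + x:  ", cv_error(fit_line, xs, ys))
```

The model with the smaller cross-validated error predicts held-out points better; here that should be y ~ 1 + x, since the data were built around a line.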

One app can do all these things in the space of a smartphone.

1. Create a model

2. Evaluate and find effect size

3. Compare models

4. Inference for comparing models
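
For the inference step, one standard choice is the F statistic for nested least-squares models, which compares the drop in residual sum of squares to the residual variation left in the larger model. The sketch below assumes that textbook definition; it is not taken from the app, the data are made up, and the helper names (`rss_constant`, `rss_line`, `f_statistic`) are hypothetical.

```python
from statistics import mean


def rss_constant(ys):
    """Residual sum of squares for y ~ 1 (predict the mean)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def rss_line(xs, ys):
    """Residual sum of squares for the least-squares fit of y ~ 1 + x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def f_statistic(rss_small, rss_big, p_small, p_big, n):
    """F for nested models: p_small and p_big count fitted coefficients."""
    explained = (rss_small - rss_big) / (p_big - p_small)
    unexplained = rss_big / (n - p_big)
    return explained / unexplained


# Made-up illustrative data: a line plus alternating +1 / -1 "noise".
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs)]

f = f_statistic(rss_constant(ys), rss_line(xs, ys), p_small=1, p_big=2, n=len(ys))
print("F for y ~ 1 versus y ~ 1 + x:", f)
```

A large F says the extra term explains far more variation than chance would be expected to; turning F into a p-value requires the F distribution with (p_big - p_small, n - p_big) degrees of freedom, which is left out of this sketch.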

A Compact Guide to Classical Inference

How can we help instructors in math environments where computing is deprecated and formulas are seen as the “real math”?

Being serialized at StatPREP.org

Short book for instructors showing

  1. How to unify all the inference settings into a single method
  2. … that doesn’t require any computation besides the app
  3. … that builds confidence since approximate results can be seen by eye, and
  4. … where exact results use one simple formula

Resources

  • StatPREP.org: Little Apps and Compact Guide

  • MAA mini-course in Stats for Data Science: dtkaplan.github.io/SDS-MAA-minicourse

  • Draft of more extensive textbook: dtkaplan.github.io/SDS-book

  • The prototype app in the slides: dtkaplan.shinyapps.io/LittleAppF